Few neural architectures lend themselves to provable learning with gradient-based methods. One popular model is the single-index model, in which labels are produced by composing an unknown linear projection with a possibly unknown scalar link function. Learning this model with SGD is relatively well understood: the so-called information exponent of the link function governs a polynomial sample complexity rate. However, extending this analysis to deeper or more complicated architectures remains challenging. In this work, we consider single-index learning in the setting of symmetric neural networks. Under analytic assumptions on the activation and maximum-degree assumptions on the link function, we prove that gradient flow recovers the hidden planted direction, represented as a finitely supported vector in the feature space of power sum polynomials. We characterize a notion of information exponent adapted to our setting that controls the efficiency of learning.
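To make the power-sum parametrization concrete, here is a minimal sketch under illustrative assumptions that do not match the paper's exact setting (an identity link and a plain Euler discretization of gradient flow): a planted direction w* that is finitely supported in the feature space of power sum polynomials p_k(x) = sum_i x_i^k is recovered by gradient descent on the square loss. All dimensions and constants below are hypothetical choices.

```python
import numpy as np

# Minimal sketch (identity link, discretized gradient flow): recover a planted
# direction w* that is finitely supported in the feature space of power sum
# polynomials p_k(x) = sum_i x_i^k. Not the paper's model; illustration only.

rng = np.random.default_rng(0)
d, K, n = 20, 4, 5000                     # input dim, max degree, sample size

def power_sum_features(X, K):
    """Symmetric features p_k(x) = sum_i x_i^k for k = 1..K."""
    return np.stack([(X ** k).sum(axis=1) for k in range(1, K + 1)], axis=1)

w_star = np.zeros(K)
w_star[2] = 1.0                           # support on the degree-3 power sum

X = rng.standard_normal((n, d))
Phi = power_sum_features(X, K)
Phi = (Phi - Phi.mean(axis=0)) / Phi.std(axis=0)   # standardize features
y = Phi @ w_star                          # planted labels

w = 0.01 * rng.standard_normal(K)
lr = 0.1
for _ in range(500):                      # Euler steps on the square loss
    w -= lr * Phi.T @ (Phi @ w - y) / n

cos = w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star))
print(f"alignment with planted direction: {cos:.3f}")
```

Because the labels lie exactly in the span of the standardized power-sum features, this convex surrogate recovers w* up to numerical error; the paper's actual result concerns the harder non-convex problem with a nonlinear link.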
Sparse high-dimensional functions have arisen as a rich framework for studying the behavior of gradient-descent methods using shallow neural networks, showcasing their ability to perform feature learning beyond linear models. Among these functions, the simplest are single-index models f(x) = φ(x · θ∗), where the labels are generated by an arbitrary non-linear scalar link function φ applied to an unknown one-dimensional projection θ∗ of the input data. By focusing on Gaussian data, several recent works have built a remarkable picture in which the so-called information exponent (related to the regularity of the link function) controls the required sample complexity. In essence, these tools exploit the stability and spherical symmetry of Gaussian distributions. In this work, building on the framework of [Ben Arous et al., 2021], we explore extensions of this picture beyond the Gaussian setting, where stability or symmetry might be violated. Focusing on the planted setting where φ is known, our main results establish that Stochastic Gradient Descent can efficiently recover the unknown direction θ∗ in the high-dimensional regime, under assumptions that extend previous works [Yehudai and Shamir, 2020; Wu, 2022].
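The following sketch illustrates the planted single-index setting with Gaussian data. It is not the paper's algorithm, just projected mini-batch SGD on the unit sphere with an illustrative cubic link; the dimension, batch size, and step size are all hypothetical choices.

```python
import numpy as np

# Minimal sketch: projected mini-batch SGD on the sphere for a planted
# single-index model y = phi(x . theta*), with Gaussian inputs and phi known.
# Illustration only; constants are arbitrary.

rng = np.random.default_rng(1)
d = 200
theta_star = np.zeros(d)
theta_star[0] = 1.0                       # hidden planted direction

phi  = lambda t: t ** 3                   # known link; t^3 = He_3(t) + 3*He_1(t)
dphi = lambda t: 3 * t ** 2

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)            # random init on the sphere
lr, batch = 0.2 / d, 64

for step in range(3000):                  # fresh Gaussian batch each step
    X = rng.standard_normal((batch, d))
    y = phi(X @ theta_star)
    t = X @ theta
    resid = (phi(t) - y) * dphi(t)        # per-sample scalar in the gradient
    grad = X.T @ resid / batch            # gradient of the empirical square loss
    theta -= lr * grad
    theta /= np.linalg.norm(theta)        # project back onto the sphere

print(f"overlap <theta, theta*> = {theta @ theta_star:.3f}")
```

Since t³ = He₃(t) + 3t has a non-zero first Hermite coefficient, this link has information exponent 1, so the overlap grows from a random initialization of order d^(-1/2) with a sample complexity roughly linear in d; links with higher information exponent require polynomially more samples.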
Domain adaptation in imitation learning represents an essential step towards improving generalizability. However, even in the restricted setting of third-person imitation, where transfer is between isomorphic Markov Decision Processes, there are no strong guarantees on the performance of transferred policies. We present problem-dependent, statistical learning guarantees for third-person imitation from observation in an offline setting, and a lower bound on performance in the online setting.